Fix BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional)#123783
Fix BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional)#123783
Conversation
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
|
Consider your design. Why is it correct and the simplest form of the solution? Are there other possibilities? I believe you consisdered the following. Why did you pick something different? private static unsafe TChar* MatchChars<TChar>(TChar* p, TChar* pEnd, ReadOnlySpan<TChar> value)
where TChar : unmanaged, IUtfChar<TChar>
{
Debug.Assert((p != null) && (pEnd != null) && (p <= pEnd));
fixed (TChar* stringPointer = &MemoryMarshal.GetReference(value))
{
TChar* str = stringPointer;
if (TChar.CastToUInt32(*str) != '\0')
{
// We only hurt the failure case
// This fix is for French, Kazakh, and Ukrainian cultures. Since a user cannot type 0xA0 or 0x202F as a
// space character we use 0x20 space character instead to mean the same.
// We also need to handle the reverse case where the input has 0xA0 or 0x202F and the format string has 0x20.
while (true)
{
uint cp = (p < pEnd) ? TChar.CastToUInt32(*p) : '\0';
uint val = TChar.CastToUInt32(*str);
if ((cp != val) && !((IsSpaceReplacingChar(val) && (cp == '\u0020')) || (IsSpaceReplacingChar(cp) && (val == '\u0020'))))
{
break;
}
p++;
str++;
if (TChar.CastToUInt32(*str) == '\0')
{
return p;
}
}
}
}
return null;
}
}
``` |
Thank you for the question! I tested your simpler solution (commit 9f5c3ae5) and you're right that it's more elegant. However, both approaches have the same limitation: The core issue:
My complex UTF-8 handling attempted to fix this but introduced bugs with pointer advancement. Your simpler bidirectional check works perfectly for char (UTF-16) but not for UTF-8, which is why the original outer loop test still fails with both approaches. The simpler solution is correct and I've adopted it (removes ~70 lines of buggy code). The remaining UTF-8 issue needs a different approach - likely at the point where |
…-8 handling) Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Done! I've moved the two Ukrainian culture tests into the existing |
…llowTrailingWhite
|
/azp run runtime-libraries-coreclr outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@copilot is this related failure to the change here? |
... This is the original issue that the PR partially addresses. The failure is in UTF-8 parsing (line 828: |
|
Forgot to check results - yes. It's related.
…________________________________
From: Tarek Mahmoud Sayed ***@***.***>
Sent: Tuesday, February 10, 2026 10:44 AM
To: dotnet/runtime ***@***.***>
Cc: Art Leonard ***@***.***>; Assign ***@***.***>
Subject: Re: [dotnet/runtime] Fix BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional) (PR #123783)
[https://avatars.githubusercontent.com/u/10833894?s=20&v=4]tarekgh left a comment (dotnet/runtime#123783)<#123783 (comment)>
@copilot<https://gh.io/copilot-coding-agent-docs> is this related failure to the change here?
{ "HelixJobId": "bd8a8479-7771-47b4-bfb6-9b29c8662936", "HelixWorkItemName": "System.Runtime.Numerics.Tests" }
Error message
System.FormatException : The value could not be parsed.
Stack trace
at System.Number.ThrowOverflowOrFormatException(ParsingStatus status) in /_/src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs:line 29
at System.Number.ParseBigInteger[TChar](ReadOnlySpan`1 value, NumberStyles style, NumberFormatInfo info) in /_/src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs:line 136
at System.Numerics.BigInteger.Parse(ReadOnlySpan`1 utf8Text, NumberStyles style, IFormatProvider provider) in /_/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigInteger.cs:line 707
at System.Numerics.Tests.parseTest.VerifyParseSpanToString(String num1, NumberStyles ns, Boolean failureNotExpected, String expected) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 828
at System.Numerics.Tests.parseTest.VerifyParseToString(String num1, NumberStyles ns, Boolean failureNotExpected, String expected) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 884
at System.Numerics.Tests.parseTest.VerifyParseToString(String num1, NumberStyles ns, Boolean failureNotExpected) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 818
at System.Numerics.Tests.parseTest.VerifyNumberStyles(NumberStyles ns, Random random) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 689
at System.Numerics.Tests.parseTest.RunParseToStringTests(CultureInfo culture) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 52
at InvokeStub_parseTest.RunParseToStringTests(Object, Span`1)
at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 95
—
Reply to this email directly, view it on GitHub<#123783 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ACLGGM2BA66I6ITKWB2ISFD4LIRH7AVCNFSM6AAAAACTMH5XXSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTQNZZHE4TCMBRGQ>.
You are receiving this because you were assigned.Message ID: ***@***.***>
|
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
🤖 Copilot Code Review — PR #123783Holistic AssessmentMotivation: The PR addresses a real test failure with Ukrainian culture (uk-UA) parsing in BigInteger where the test generates numbers with regular spaces but the culture's NumberGroupSeparator is NBSP (U+00A0). The linked issue #120283 confirms this is a genuine problem blocking CI. Approach: The fix normalizes both input characters and format characters to a common space (U+0020) before comparison, making the equivalence bidirectional. This is a reasonable approach that aligns with how Summary: Detailed Findings
|
|
/azp run runtime-libraries-coreclr outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Fix for BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional)
Fixes #120283
Issue
BigInteger parsing fails with cultures that use NBSP (U+00A0) as
NumberGroupSeparator(like Ukrainian uk-UA) when users type regular space (U+0020).Solution
Modified the
MatchCharsfunction inNumber.Parsing.Common.csto handle bidirectional space-replacing character equivalence:NormalizeSpaceReplacingCharhelper that converts NBSP/narrow NBSP to regular spaceNormalizeSpaceReplacingChar(cp) != NormalizeSpaceReplacingChar(val)Testing
ParseWithNBSPAsGroupSeparator()test in existingparse.csfileImpact
This fix only impacts BigInteger parsing. Other numeric types (Complex, double, float, decimal) use different parsing infrastructure in System.Private.CoreLib that doesn't include the common parsing file.
Known Limitations
This fix addresses char (UTF-16) parsing only. UTF-8 parsing issues are tracked separately in #124016.
Progress
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.